ABSTRACT
As the COVID-19 pandemic fundamentally reshaped how people live and work remotely, Voice over IP (VoIP) telephony and video conferencing have become a primary means of connecting communities. However, little has been done to understand the feasibility and limitations of delivering adversarial voice samples over such communication channels. In this paper, we propose TAINT (Targeted Adversarial Voice over IP Network), the first targeted, query-efficient, hard-label black-box adversarial attack on commercial speech recognition platforms over VoIP. The unique channel characteristics of VoIP pose significant new challenges, such as signal degradation, random channel noise, and frequency selectivity. To address these challenges, we systematically analyze the structure and channel characteristics of VoIP through reverse engineering. We then develop a noise-resilient, efficient gradient estimation method to ensure steady and fast convergence of the adversarial sample generation process. We demonstrate our attack in both over-the-air and over-the-line settings on four commercial automatic speech recognition (ASR) systems over the five most popular VoIP conferencing software (VCS). We show that TAINT achieves performance comparable to existing methods even with the added VoIP channel. Even in the most challenging scenario, where there is an active speaker in Zoom, TAINT can still succeed within 10 attempts while staying out of the speaker focus of the video conference. © 2022 Owner/Author.
ABSTRACT
Videoconferencing applications have seen a surge in their user base owing to the COVID-19 pandemic. The security of these applications has been a prominent topic, since the data of millions of VoIP users is involved. However, research on VoIP forensics is still limited to Skype and Zoom. This paper presents a detailed forensic analysis of Microsoft Teams, one of the top three videoconferencing applications, covering memory, disk-space, and network forensics. Extracted artifacts include critical user data, such as emails, user account information, profile photos, exchanged (including deleted) messages, exchanged text/media files, timestamps, and Advanced Encryption Standard (AES) encryption keys. The encrypted network traffic is investigated to reconstruct the client-server connections involved in a Microsoft Teams meeting, along with IP addresses, timestamps, and digital certificates. The analysis demonstrates that, even with strong security mechanisms in place, user data can still be extracted from a client’s desktop. The artifacts can also serve as digital evidence in a court of law and give forensic analysts a reference for cases involving Microsoft Teams. © 2022, ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering.